Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning


Abstract

This paper deals with distributed reinforcement learning problems with safety constraints. In particular, we consider that a team of agents cooperate in a shared environment, where each agent has its individual reward function and the safety constraints involve all agents' joint actions. As such, the aim is to maximize the team-average long-term return, subject to constraints on the team-average long-term cost. More intriguingly, no central controller is assumed to coordinate the agents, and both the rewards and constraints are only known locally/privately. Instead, the agents are connected by a peer-to-peer communication network and share information with their neighbors. In this work, we first formulate this problem as a distributed constrained Markov decision process (D-CMDP) with networked agents. Then, we propose a decentralized policy gradient (PG) method, Safe Dec-PG, to perform policy optimization based on this D-CMDP model over a network. Convergence guarantees, together with numerical results, showcase the superiority of the proposed algorithm. To the best of our knowledge, this is the first decentralized PG algorithm that accounts for coupled safety constraints with a quantifiable convergence rate in multi-agent reinforcement learning. Finally, we emphasize that our algorithm is also novel in solving a class of stochastic nonconvex-concave minimax problems, where both the algorithm design and the corresponding theoretical analysis are of independent interest.
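The abstract describes a decentralized primal-dual (descent-ascent) scheme: each agent keeps a local policy parameter and a local dual variable for the shared safety constraint, mixes both with its neighbors over a communication graph, then takes a penalized gradient step on the reward and a projected ascent step on the constraint cost. The following is a minimal illustrative sketch of that pattern, not the authors' Safe Dec-PG algorithm; the mixing matrix, step sizes, and the stand-in `local_reward_grad`/`local_cost` functions are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical sketch of a decentralized gradient descent-ascent update.
# Agent i holds policy parameters theta[i] and a dual variable lam[i] for
# the coupled safety constraint; W is a doubly stochastic mixing matrix
# encoding the peer-to-peer communication network.

rng = np.random.default_rng(0)
n_agents, dim = 4, 3
W = np.full((n_agents, n_agents), 1.0 / n_agents)  # fully connected example
theta = rng.normal(size=(n_agents, dim))
lam = np.zeros(n_agents)
alpha, beta = 0.1, 0.05  # primal and dual step sizes

def local_reward_grad(i, th):
    # stand-in for a local stochastic policy-gradient estimate
    return -(th - i)  # gradient of -(1/2) * ||th - i||^2

def local_cost(i, th):
    # stand-in for a local long-term constraint-cost estimate (target <= 0)
    return float(th @ th) - 5.0

for _ in range(200):
    # consensus step: average parameters and duals with neighbors
    theta = W @ theta
    lam = W @ lam
    for i in range(n_agents):
        # primal step: ascend the reward penalized by the dual variable
        g = local_reward_grad(i, theta[i]) - lam[i] * 2.0 * theta[i]
        theta[i] += alpha * g
        # dual step: projected ascent keeps the multiplier nonnegative
        lam[i] = max(0.0, lam[i] + beta * local_cost(i, theta[i]))
```

The consensus step lets agents agree on a team policy even though rewards and costs stay private, while the nonnegative dual variables enforce the constraints, which is what makes the overall problem a stochastic nonconvex-concave minimax problem.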


Related Articles

Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning

In reinforcement learning, agents learn by performing actions and observing their outcomes. Sometimes, it is desirable for a human operator to interrupt an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, which impact their reward, to specific states and deliberately avoid them. The situation is pa...


Multi-agent Learning and the Reinforcement Gradient

The number of proposed reinforcement learning algorithms appears to be ever-growing. This article addresses this diversification by identifying a persistent principle shared by several independent reinforcement learning algorithms that have been applied to multi-agent settings. While their learning structures may look very diverse, algorithms such as Gradient Ascent, Cross learning, and variations of Q-learning a...


Gradient Descent for General Reinforcement Learning

Andrew Moore, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213-3891. A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learn...


Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

Autonomous driving is a multi-agent setting where the host vehicle must apply sophisticated negotiation skills with other road users when overtaking, giving way, merging, taking left and right turns, and while pushing ahead in unstructured urban roadways. Since there are many possible scenarios, manually tackling all possible cases will likely yield an overly simplistic policy. Moreover, one must ba...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i10.17062